AITopics | multiple imputation

Collaborating Authors

multiple imputation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information

van Buuren, Stef

arXiv.org Machine LearningMay-14-2026

Large language models (LLMs) are increasingly deployed in settings where the available context is incomplete or degraded. We argue that an LLM generating answers under incomplete context can be viewed as an implicit imputer, and evaluated against a criterion from the multiple imputation (MI) literature: uncertainty should scale with the amount of missing information. We assess this criterion on SQuAD, using a controlled framework in which context availability is varied across five levels. We evaluate two answer-level uncertainty measures that can be estimated from repeated sampling: sampling-based confidence (empirical mode frequency) and response entropy. Confidence fails to reflect increasing missingness: it remains high even as accuracy collapses. Entropy, by contrast, increases with context removal, consistent with the MI analogy, and explains substantially more variance in accuracy than confidence across all evidence levels (quadratic $R^2$ gap up to 0.057). We further introduce a black-box diagnostic $ρ_R(α)$ that estimates the proportion of baseline uncertainty resolved by context level $α$, requiring only repeated sampling with and without context. These results suggest that entropy is a more responsive black-box uncertainty measure than confidence under incomplete context.

information, large language model, natural language, (20 more...)

arXiv.org Machine Learning

2605.13188

Country:

North America > United States (0.30)
Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports > Football (0.49)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

tBayes-MICE: A Bayesian Approach to Multiple Imputation for Time Series Data

Ibenegbu, Amuche, de Micheaux, Pierre Lafaye, Chandra, Rohitash

arXiv.org Machine LearningApr-10-2026

Time-series analysis is often affected by missing data, a common problem across several fields, including healthcare and environmental monitoring. Multiple Imputation by Chained Equations (MICE) has been prominent for imputing missing values through "fully conditional specification". We extend MICE using the Bayesian framework (tBayes-MICE), utilising Bayesian inference to impute missing values via Markov Chain Monte Carlo (MCMC) sampling to account for uncertainty in MICE model parameters and imputed values. We also include temporally informed initialisation and time-lagged features in the model to respect the sequential nature of time-series data. We evaluate the tBayes-MICE method using two real-world datasets (AirQuality and PhysioNet), and using both the Random Walk Metropolis (RWM) and the Metropolis-Adjusted Langevin Algorithm (MALA) samplers. Our results demonstrate that tBayes-MICE reduces imputation errors relative to the baseline methods over all variables and accounts for uncertainty in the imputation process, thereby providing a more accurate measure of imputation error. We also found that MALA mixed better than RWM across most variables, achieving comparable accuracy while providing more consistent posterior exploration. Overall, these findings suggest that the tBayes-MICE framework represents a practical and efficient approach to time-series imputation, balancing increased accuracy with meaningful quantification of uncertainty in various environmental and clinical settings.

artificial intelligence, imputation, machine learning, (19 more...)

arXiv.org Machine Learning

2603.27142

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Oceania > Australia > New South Wales (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

Predictive Uncertainty in Short-Term PV Forecasting under Missing Data: A Multiple Imputation Approach

Pashmchi, Parastoo, Benoit, Jérôme, Kanagawa, Motonobu

arXiv.org Machine LearningMar-17-2026

Missing values are common in photovoltaic (PV) power data, yet the uncertainty they induce is not propagated into predictive distributions. We develop a framework that incorporates missing-data uncertainty into short-term PV forecasting by combining stochastic multiple imputation with Rubin's rule. The approach is model-agnostic and can be integrated with standard machine-learning predictors. Empirical results show that ignoring missing-data uncertainty leads to overly narrow prediction intervals. Accounting for this uncertainty improves interval calibration while maintaining comparable point prediction accuracy. These results demonstrate the importance of propagating imputation uncertainty in data-driven PV forecasting.

data quality, imputation, machine learning, (19 more...)

arXiv.org Machine Learning

2603.15564

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.05)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
North America > United States > New York (0.04)
(5 more...)

Genre: Research Report > New Finding (0.87)

Industry: Energy > Renewable > Solar (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Quality (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

f076073b2082f8741a9cd07b789c77a0-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-11-2026, 01:31:19 GMT

calibration, imputation, reviewer, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.32)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.30)

Add feedback

Imputation Uncertainty in Interpretable Machine Learning Methods

Golchian, Pegah, Wright, Marvin N.

arXiv.org Machine LearningDec-22-2025

In real data, missing values occur frequently, which affects the interpretation with interpretable machine learning (IML) methods. Recent work considers bias and shows that model explanations may differ between imputation methods, while ignoring additional imputation uncertainty and its influence on variance and confidence intervals. We therefore compare the effects of different imputation methods on the confidence interval coverage probabilities of the IML methods permutation feature importance, partial dependence plots and Shapley values. We show that single imputation leads to underestimation of variance and that, in most cases, only multiple imputation is close to nominal coverage.

imputation, imputation uncertainty, mean imputation, (14 more...)

arXiv.org Machine Learning

2512.17689

Country:

Europe > Germany > Bremen > Bremen (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Add feedback

Beyond Accuracy: An Empirical Study of Uncertainty Estimation in Imputation

Hossain, Zarin Tahia, Milani, Mostafa

arXiv.org Artificial IntelligenceNov-27-2025

Handling missing data is a central challenge in data-driven analysis. Modern imputation methods not only aim for accurate reconstruction but also differ in how they represent and quantify uncertainty. Yet, the reliability and calibration of these uncertainty estimates remain poorly understood. This paper presents a systematic empirical study of uncertainty in imputation, comparing representative methods from three major families: statistical (MICE, SoftImpute), distribution alignment (OT-Impute), and deep generative (GAIN, MIWAE, TabCSDI). Experiments span multiple datasets, missingness mechanisms (MCAR, MAR, MNAR), and missingness rates. Uncertainty is estimated through three complementary routes: multi-run variability, conditional sampling, and predictive-distribution modeling, and evaluated using calibration curves and the Expected Calibration Error (ECE). Results show that accuracy and calibration are often misaligned: models with high reconstruction accuracy do not necessarily yield reliable uncertainty. We analyze method-specific trade-offs among accuracy, calibration, and runtime, identify stable configurations, and offer guidelines for selecting uncertainty-aware imputers in data cleaning and downstream machine learning pipelines.

data quality, imputation, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2511.21607

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Missing Data Multiple Imputation for Tabular Q-Learning in Online RL

Chasalow, Kyla, Wu, Skyler, Murphy, Susan

arXiv.org Machine LearningOct-14-2025

Missing data in online reinforcement learning (RL) poses challenges compared to missing data in standard tabular data or in offline policy learning. The need to impute and act at each time step means that imputation cannot be put off until enough data exist to produce stable imputation models. It also means future data collection and learning depend on previous imputations. This paper proposes fully online imputation ensembles. We find that maintaining multiple imputation pathways may help balance the need to capture uncertainty under missingness and the need for efficiency in online settings. We consider multiple approaches for incorporating these pathways into learning and action selection. Using a Grid World experiment with various types of missingness, we provide preliminary evidence that multiple imputation pathways may be a useful framework for constructing simple and efficient online missing data RL methods.

imputation, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2510.10709

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

f076073b2082f8741a9cd07b789c77a0-AuthorFeedback.pdf

Neural Information Processing SystemsAug-17-2025, 05:31:06 GMT

calibration, imputation, reviewer, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.32)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.30)

Add feedback

Missing value imputation with adversarial random forests -- MissARF

Golchian, Pegah, Kapar, Jan, Watson, David S., Wright, Marvin N.

arXiv.org Machine LearningJul-22-2025

Handling missing values is a common challenge in biostatistical analyses, typically addressed by imputation methods. We propose a novel, fast, and easy-to-use imputation method called missing value imputation with adversarial random forests (MissARF), based on generative machine learning, that provides both single and multiple imputation. MissARF employs adversarial random forest (ARF) for density estimation and data synthesis. To impute a missing value of an observation, we condition on the non-missing values and sample from the estimated conditional distribution generated by ARF. Our experiments demonstrate that MissARF performs comparably to state-of-the-art single and multiple imputation methods in terms of imputation quality and fast runtime with no additional costs for multiple imputation.

artificial intelligence, imputation, machine learning, (16 more...)

arXiv.org Machine Learning

2507.15681

Country:

North America > United States > Wyoming > Albany County > Laramie (0.14)
Europe > Germany > Bremen > Bremen (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Public Health (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

MoCap-Impute: A Comprehensive Benchmark and Comparative Analysis of Imputation Methods for IMU-based Motion Capture Data

Bekhit, Mahmoud, Salah, Ahmad, Alrawahi, Ahmed Salim, Attia, Tarek, Ali, Ahmed, Eldesokey, Esraa, Fathalla, Ahmed

arXiv.org Artificial IntelligenceJul-15-2025

Motion capture (MoCap) data from wearable Inertial Measurement Units (IMUs) is vital for applications in sports science, but its utility is often compromised by missing data. Despite numerous imputation techniques, a systematic performance evaluation for IMU-derived MoCap time-series data is lacking. We address this gap by conducting a comprehensive comparative analysis of statistical, machine learning, and deep learning imputation methods. Our evaluation considers three distinct contexts: univariate time-series, multivariate across subjects, and multivariate across kinematic angles. To facilitate this benchmark, we introduce the first publicly available MoCap dataset designed specifically for imputation, featuring data from 53 karate practitioners. We simulate three controlled missingness mechanisms: missing completely at random (MCAR), block missingness, and a novel value-dependent pattern at signal transition points. Our experiments, conducted on 39 kinematic variables across all subjects, reveal that multivariate imputation frameworks consistently outperform univariate approaches, particularly for complex missingness. For instance, multivariate methods achieve up to a 50% mean absolute error reduction (MAE from 10.8 to 5.8) compared to univariate techniques for transition point missingness. Advanced models like Generative Adversarial Imputation Networks (GAIN) and Iterative Imputers demonstrate the highest accuracy in these challenging scenarios. This work provides a critical baseline for future research and offers practical recommendations for improving the integrity and robustness of Mo-Cap data analysis.

data quality, imputation, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2507.10334

Country:

Africa > Middle East > Egypt (0.68)
Asia > Middle East (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback